Abstract 

Objectives To investigate the proportion of original studies included in 
systematic reviews and meta-analyses on the diagnostic accuracy of 
screening tools for depression that appropriately exclude patients who 
already have a diagnosis of or are receiving treatment for depression 
and to determine whether these systematic reviews and meta-analyses 
evaluate possible bias from the inclusion of such patients. 

Design Systematic review. 

Data sources Medline, PsycINFO, CINAHL, Embase, ISI, SCOPUS, 
and Cochrane databases were searched from 1 January 2005 to 29 
October 2009. 

Eligibility criteria for selecting studies Systematic reviews and 
meta-analyses in any language that reported on the diagnostic accuracy 
of screening tools for depression. 

Results Only eight of 197 (4%) unique publications from 17 systematic 
reviews and meta-analyses specifically excluded patients who already 
had a diagnosis of or were receiving treatment for depression. No 
systematic reviews or meta-analyses commented on possible bias from 
the inclusion of such patients, even though 10 reviews used quality 
assessment tools with items to rate risk of bias from composition of the 
sample of patients. 

Conclusions Studies of the accuracy of screening tools for depression 
rarely exclude patients who already have a diagnosis of or are receiving 
treatment for depression, a potential bias that is not evaluated in 
systematic reviews and meta-analyses. This could result in inflated 
estimates of accuracy on which clinical practice and preventive care 
guidelines are often based, a problem that takes on greater importance 
as the rate of diagnosed and treated depression in the population 
increases. 

Introduction 

Depression is a common and disabling condition, 1 and improving 
care has been prioritised. Routine screening for depression is 
one solution that has been proposed. Depression screening 
involves the use of screening tools to identify patients who 
might have depression but who are not seeking treatment for 
symptoms and whose depression is not otherwise recognised 
by their physicians so that they can be further assessed and, if 
appropriate, treated. 2 3 Screening for depression has been 
recommended in several medical settings, including 
cardiovascular care, 4 perinatal care, 5-7 oncological care, 8 and 
primary care, 9 although no clinical trial has found better 
depression outcomes for screened versus unscreened patients 
when the same treatment and care resources are potentially 
available to both groups. 10 " Screening for depression can 
identify patients with depression who might otherwise go 
undetected, but it can also lead to misdiagnosis, the identification 
of patients as being depressed who are not, and overdiagnosis, 
which occurs when some patients with mild conditions are 
identified as depressed and exposed to the risk of labelling and 
treatment, even when the condition might not cause measurable 
morbidity or mortality. Recently, a report from the National 
Institute for Health and Clinical Excellence (NICE)" noted a 
lack of evidence for benefit from depression screening and, 
rather than routine screening, recommended case identification 
strategies to identify depression among high risk groups of 
patients or patients otherwise identified by physicians as possibly 
having depression. 

A great deal of research has been conducted to determine the 
diagnostic accuracy of depression screening tests in different 
clinical settings. Based on data from such studies, expert panels 
have considered the risks and benefits of screening and issued 
recommendations to screen for depression in various settings. 9 11 
Diagnostic or screening tests, however, are useful only to the 
extent that they distinguish between disordered and 
non-disordered states that are not otherwise obvious to 
clinicians 12 and if they are accurate across the spectrum of 
patients who will be assessed in clinical practice. 12-18 

The term "spectrum effect" has been used to describe variations 
in test performance that sometimes occur across subgroups of 
patients that differ in demographic or clinical features. Spectrum 
effects raise questions about the generalisability of study results 
to specific populations of patients that might differ in important 
ways from study samples. 19 The term "spectrum bias" is related 
and also describes situations in which the accuracy of a test is 
heterogeneous across subgroups of patients. Spectrum bias is 
said to be present when a study samples preferentially from 
certain portions of the patient spectrum but provides a global 
estimate of accuracy that could misrepresent what would be 
experienced in actual practice. 12-19 Estimates of diagnostic 
accuracy that are based on case-control designs and whose 
samples include only obvious cases and healthy controls, for 
instance, have been shown to substantially overestimate 
diagnostic accuracy. 13 14 18 

Self reported depression questionnaires are used for various 
purposes (such as screening for unidentified cases, tracking 
severity of symptoms, detecting relapse). For the purpose of 
screening, which involves the identification of cases not 
previously recognised, if individuals who already have a 
diagnosis of depression are not specifically excluded from 
studies assessing the diagnostic accuracy of depression screening 
tools, examined cohorts will have a greater prevalence and 
severity of depression than if only individuals without clinically 
recognised depression were screened. Not excluding patients 
who already have a diagnosis would, in turn, lead to 
determinations of screening accuracy and new case yield that 
are inflated compared with what would be achieved if the 
instrument were used to screen patients in clinical practice. 12-18 

Systematic reviews and meta-analyses are highly cited and are 
prioritised in grading evidence for practice guidelines. 20 21 If 
studies of the diagnostic accuracy of depression screening tools 
that include patients who already have a diagnosis or are 
receiving treatment are included in systematic reviews and 
meta-analyses without adjustment for potential bias, these 
reviews could provide misleading accuracy estimates, thereby 
misleading calculations of risk-benefit by expert panels and, 
thus, clinicians. 

We evaluated the proportion of studies included in systematic 
reviews and meta-analyses of the diagnostic accuracy of 
depression screening tools that excluded patients who already 
had a diagnosis of or were receiving treatment for depression. 
We also assessed whether authors of systematic reviews and 
meta-analyses noted the possibility of spectrum bias from the 
inclusion of such patients in the original research studies they 
reviewed. We hypothesised that few studies of depression 
screening tools would exclude such patients and that systematic 
reviews and meta-analyses would not consider spectrum bias 
from their inclusion. 

Methods 

Selection of systematic reviews and 
meta-analyses 

We searched Medline, PsycINFO, CINAHL, Embase, ISI, 
SCOPUS, and Cochrane databases from 1 January 2005 to 29 
October 2009 for systematic reviews and meta-analyses of the 
diagnostic accuracy of depression screening tools. We restricted 
the search to this period to obtain recent systematic reviews and 
meta-analyses that reflect relatively current practice. The search 
terms used were ((systematic review OR meta-analysis) AND 
(screening OR sensitivity OR specificity) AND depression). 
Eligible articles included systematic reviews and meta-analyses 
in any language published in final form or on the internet before 
final publication that reviewed the accuracy of screening tools 
for depression compared with a diagnosis of depression. 
Depression screening tools included any self report measure 
used to attempt to identify patients with depression. We included 
systematic reviews and meta-analyses that reviewed diagnostic 
accuracy and other psychometric characteristics of depression 
questionnaires (such as validity and reliability) but extracted 
data only on diagnostic accuracy. We excluded systematic 
reviews and meta-analyses that compared scores only on self 
report screening tools with classifications of depression based 
on cut offs from other self report screening tools but not a 
diagnosis of depression. Two investigators reviewed systematic 
reviews and meta-analyses for eligibility independently. If either 
reviewer deemed a systematic review or meta-analysis 
potentially eligible based on a review of the title and abstract, 
we carried out a full text review of the systematic review or 
meta-analysis. Any disagreement between reviewers after full 
text review was resolved by consensus after consultation with 
an independent third reviewer. Chance corrected agreement 
between reviewers was assessed with Cohen's κ. 
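
As an illustration of the chance corrected agreement statistic named above, the following minimal sketch computes Cohen's κ for two reviewers making include or exclude decisions. The function name and the example counts are hypothetical and are not data from this review; they only show how observed agreement is adjusted for the agreement expected by chance.

```python
def cohens_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two reviewers making binary include/exclude decisions.

    Arguments are the four cells of the 2x2 agreement table:
    both_include  - both reviewers include the article
    only_a        - reviewer A includes, reviewer B excludes
    only_b        - reviewer B includes, reviewer A excludes
    both_exclude  - both reviewers exclude the article
    """
    n = both_include + only_a + only_b + both_exclude
    # Proportion of articles on which the reviewers agree.
    observed = (both_include + both_exclude) / n
    # Marginal inclusion rates for each reviewer.
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected if the two reviewers decided independently.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


# Hypothetical counts, for illustration only (not taken from this review).
example_kappa = cohens_kappa(both_include=40, only_a=5, only_b=5, both_exclude=50)
print(f"kappa = {example_kappa:.2f}")
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance.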

Data extraction 

Two investigators independently extracted and entered on a 
standardised spreadsheet data items from the systematic reviews 
and meta-analyses, as well as from the original studies included 
in the reviews, with discrepancies resolved by consensus. For 
each systematic review or meta-analysis, they recorded whether 
or not original studies mentioned possible bias because of the 
inclusion of patients who already had a diagnosis of or were 
receiving treatment for depression. Investigators also determined 
whether or not each systematic review or meta-analysis included 
an assessment of the quality of included diagnostic accuracy 
studies. If so, they recorded the tool that was used to do this and 
whether or not the tool included an evaluation of the risk of 
spectrum bias. Investigators also recorded the impact factor of 
the journal in which each systematic review or meta-analysis 
was published, using the impact factor for the year of 
publication. 22 In addition, they reviewed the introduction and 
discussion sections and recorded the described purpose for which 
accuracy of the screening tool was being assessed (such as 
screening or identification of new cases, monitoring progress 
of treatment, detection of relapse). 

Original diagnostic accuracy studies included in the systematic 
reviews and meta-analyses were classified as having excluded 
patients who already had a diagnosis of or were receiving 
treatment for depression if the authors of the study specifically 
indicated this in the exclusion criteria. If studies did not 
specifically indicate that such patients were excluded they were 
classified as having included them. 

For each systematic review or meta-analysis, and overall, we 
determined the number of unique publications on the diagnostic 
accuracy of depression screening tools, as well as the number 
of unique cohorts of patients. We assessed the number of 
publications and the number of cohorts because, in some cases, 
there were multiple publications from the same cohort. This 
occurred, for instance, when different publications reported 
results from different screening tools or criterion standards with 
the same group of patients, when one or more publications 
reported on a subset of the sample from another publication, or 
when the same patients were assessed at different time points 
(such as during pregnancy and after delivery). Identification of 
different publications from the same cohort was done by cross 
referencing authors and coauthors, characteristics of patients, 
and countries in which the research was conducted. Verification 
was done by comparing information in the publications. Cohort 
status was coded conservatively in that publications that seemed 
to be from the same cohort were coded as such, even if this 
could not be confirmed with 100% certainty. 

We did not publish or register a review protocol for this study. 
All methods were determined a priori with the exception of 
reviewing the introduction and discussion sections to record the 
described purpose for which the accuracy of the depression 
screening tool was being assessed. This additional step was 
added to the study methods after data extraction and tabulation 
of results to clarify whether the intention of the included 
systematic reviews and meta-analyses was to assess diagnostic 
accuracy for identification of new cases versus other possible 
uses of depression symptom questionnaires. 

Results 
Search results 

The electronic database search yielded 1216 unique titles and 
abstracts for review. Of these, 1160 were excluded after review 
of titles and abstracts because they did not report results from 
a systematic review or meta-analysis or because they reported 
data from a systematic review or meta-analysis that was not 
related to the diagnostic accuracy of a depression screening tool. 
Of the 56 articles that underwent full text review, we excluded 
39, leaving 17 eligible systematic reviews and meta-analyses 
(figure). Chance corrected agreement on inclusion and 
exclusion decisions between reviewers, as assessed with 
Cohen's κ, was 0.95. 

Table 1 shows the characteristics of selected systematic reviews 
and meta-analyses. Of the 17 systematic reviews and 
meta-analyses included, 10 were systematic reviews, 23-32 and 
seven were meta-analyses. 33-39 The systematic reviews and 
meta-analyses included between two and 63 original studies 
and were published in a wide range of journals in terms of 
impact factor. Two meta-analyses assessed the nine item 
depression scale of the patient health questionnaire (PHQ-9) 33 39 ; 
one systematic review 23 and two meta-analyses 37 38 evaluated 
the geriatric depression scale; seven systematic reviews 24 26 27 29-32 
and one meta-analysis 36 assessed depression screening tools, 
generally, in defined medical populations; two systematic 
reviews assessed specific screening tools, other than the patient 
health questionnaire or geriatric depression scale, in defined 
patient populations 25 28 ; and two meta-analyses assessed brief 
screening tools (for example, fewer than five items) in primary 
care 34 and palliative care. 35 All 17 systematic reviews and 
meta-analyses described the purpose of the review as related to 
determining diagnostic accuracy for new case detection by 
screening, and none discussed how their results might apply to 
other uses of depression screening tools (such as monitoring 
progress of treatment, detection of relapse). 

Inclusion or exclusion of patients who already 
had a diagnosis or were receiving treatment 

The 17 systematic reviews and meta-analyses included a total 
of 197 unique publications on the diagnostic accuracy of 
screening tools for depression in 170 unique cohorts of patients. 
The diagnostic accuracy studies examined more than 25 different 
screening tools in a wide range of patients (see appendix 1 on 
bmj.com). Only eight of 197 unique publications (4%) and eight 
of 170 cohorts (5%) specifically excluded patients who already 
had a diagnosis of or were receiving treatment for depression 
(see appendix 1). As shown in table 1, 11 of the 17 systematic 
reviews or meta-analyses 23 26 27 30-33 35 37-39 did not examine a single 
cohort of patients that specifically excluded those who already 
had a diagnosis of or were receiving treatment for depression. 

Table 2 shows that only four 40-43 of the eight studies that 
excluded such patients reported the number of patients who 
were excluded because of pre-existing mental health treatment. 
The proportion of patients excluded for this reason was 22% in 
a Veterans Affairs primary care setting in the United States 
(published in 2004) 43 ; 10% in a 2003 study of patients in general 
practice from New Zealand 42 ; 2% in a 2004 study of postpartum 
women from Turkey 40 ; and 0.2% in a 1996 study of postpartum 
women from Sweden. 41 

Treatment of spectrum bias in systematic 
reviews and meta-analyses 

As shown in table 1, 13 of the 17 systematic 
reviews and meta-analyses 23-25 27 30-36 38 39 conducted some form of quality 
assessment of included studies, including two meta-analyses 36 39 
that used the quality assessment for diagnostic accuracy studies 
(QUADAS) tool 44 ; one systematic review 27 that used the 
diagnostic test studies evaluation tool 43 ; one meta-analysis 34 that 
used the Newcastle-Ottawa scale 46 ; two systematic reviews 30 32 
that used methods developed by the US Preventive Services 
Task Force (USPSTF) 47 48 ; one systematic review 31 that based 
quality review on guidelines from the American Academy of 
Neurology 49 ; one systematic review 25 that evaluated quality 
items based on a system from the York Centre for Reviews and 
Dissemination 50 ; one systematic review 24 that used a study 
specific tool based on criteria identified by the Cochrane 
Methods Working Group on Systematic Review of Screening 
and Diagnostic Tests 51 ; one meta-analysis 35 that based quality 
ratings on a published article by Pai et al 52 ; and one systematic 
review 23 and two meta-analyses 33 38 that used ad hoc procedures, 
such as extracting data on one to two items related to study 
quality. 

Of these, 10 systematic reviews or meta-analyses 24 25 27 30-32 34-36 39 
used quality assessment methods that included an assessment 
of spectrum bias. The authors of one of these systematic 
reviews 24 noted study limitations from the lack of non-white 
patients, and the authors of another 32 reported that younger 
children were poorly represented in studies of children and 
adolescents. The authors of one meta-analysis reported that half 
of studies reviewed did not include representative samples but 
did not provide a rationale for this conclusion. 36 The authors of 
another noted the possibility of a "disease progression bias" in 
one study of patients after stroke and indicated that none of the 
other 11 studies reviewed had limitations related to composition 
of patients. 39 In one systematic review, one of four included 
studies was downgraded because of the description of the 
sample, but an explanation was not provided. 27 The authors of 
the five other systematic reviews or meta-analyses that used 
quality assessment methods that included an assessment of 
spectrum bias did not comment specifically on quality ratings 
related to possible spectrum bias. 2 ' 26 27 30 11 

Overall, none of the 17 systematic reviews or meta-analyses 
commented on possible spectrum bias from the inclusion in 
studies of patients who already had a diagnosis of or were 
receiving treatment for depression. 

Discussion 

We found that less than 5% of studies on the diagnostic accuracy 
of depression screening tools appropriately excluded patients 
who already had a diagnosis of or were receiving treatment for 
depression. The importance of this finding relates to the potential 
effect on assessments of the accuracy of depression screening 
instruments and the number of new cases they will uncover and, 
therefore, on their utility in clinical practice. The diagnostic 
accuracy of a screening test is often considered a fixed 
characteristic of a test, but it can vary substantially in 
populations with different clinical features. 16 Studies that have 
examined accuracy of diagnostic tests consistently show that 
increased prevalence or severity of disease in the cohort of 
patients being examined inflates the reported sensitivity of the 
test being assessed. 14 If the accuracy of screening tools for 
depression was studied in a group of patients, some of whom 
had already received a diagnosis for the condition, the 
assessments would be biased by the inclusion of individuals 
with a greater prevalence and severity of depression than if the 
instruments were used in clinical practice to screen patients 
without clinically recognised depression. This would, in turn, 
lead to inflated, and potentially misleading, estimates of 
accuracy on which clinical practice and preventive care 
guidelines are generally based. 

Potential magnitude of problem 

The potential magnitude of this problem grows as the prevalence 
of already diagnosed and treated depression in the population 
increases. 53 54 Estimates of the prevalence of depression in 
primary care range from 5% to 13%, including 6% to 9% among 
adults aged 55 or older. 55 Rates are somewhat higher in patients 
with chronic physical illness. 1 Among adults aged 35 and older 
in the US, rates of antidepressant use increased from 8% to 14% 
from 1996 to 2005, with a third to a half of prescriptions 
specifically for psychiatric problems. 53 Rates of prescriptions 
for antidepressants might be even higher among patients with 
chronic physical disease. Based on provincial data from Ontario, 
Canada, for instance, the rate of antidepressant prescriptions 
within six months of an acute myocardial infarction doubled 
from 8% in 1993 to 16% in 2002 among patients aged 65 and 
older. 56 In a more recent cohort of more than 1200 outpatients 
with stable cardiovascular disease, just under 20% were treated 
with an antidepressant at the time of enrolment in the study. 57 58 
In addition to patients who receive treatment with 
antidepressants, a relatively small percentage of people receive 
psychotherapy for depression without drug treatment, 55 and 
some people are recognised by their physicians as depressed 
but choose not to undergo treatment. 

A recent meta-analysis found that general practitioners correctly 
identify about 50% of patients with depression without the 
assistance of a screening tool. 60 Dichotomising a doctor's 
identification or non-identification of depressive disorders, 
however, could underestimate the degree to which they 
recognise depression. A study of over 700 patients in primary 
care from the US and the Netherlands, for instance, found that 
complete disagreement between physicians' assessments and a 
diagnostic interview for depression was much less common 
than is often thought. 61 In that study, only 27% of false negative 
cases based on physician assessments were true false negatives. 
In most cases of false negatives, physicians recognised 
symptoms of depression but underestimated severity compared 
with the diagnostic interview (40%) or gave another psychiatric 
diagnosis (33%). Thus, in many settings, a substantial proportion 
of depressed patients are recognised as depressed without 
screening, either because they seek treatment for their depression 
or because a healthcare professional otherwise recognises their 
symptoms. Based on reported rates of prescriptions for 
antidepressants and estimates of physicians' ability to recognise 
depression, it could be that as many as half or more of patients 
who are detected as cases in studies assessing the diagnostic 
accuracy of screening tools would not even be screened in 
clinical practice. 

Data are not available that would allow a precise calculation of 
the degree by which studies that fail to exclude patients who 
already have a diagnosis of or are receiving treatment for 
depression might overestimate diagnostic accuracy and the 
number of new patients who would be identified through 
depression screening. Two reviews, however, have reported 
that studies of other types of diagnostic tests that have used 
case-control designs 13 or case-control designs that compared 
severely affected patients and healthy controls 18 substantially 
overestimate diagnostic accuracy (relative diagnostic odds ratios 
3.0 13 and 4.9, 18 respectively). 

Even a relatively small increase in reported diagnostic accuracy 
resulting from the inclusion of patients who already have a 
diagnosis or are receiving treatment would result in a substantial 
overestimate of the positive predictive value and new case yield 
from depression screening compared with what would be 
expected in clinical practice. A systematic review of the 
diagnostic accuracy of depression screening tools in primary 
care found a median sensitivity of 85% and median specificity 
of 74%. 62 Based on this, in a primary care setting with a 
prevalence rate of 10%, 55 32% of all patients would screen 
positive for depression, of whom 27% would be true positive 
cases, equivalent to 9% of all patients screened. If existing 
studies overestimated the sensitivity by even 10% because of 
the inclusion of patients with a diagnosis or being treated 
(relative diagnostic odds ratio 1.9), and it is conservatively 
assumed that physicians recognise 50% of depressed patients 
without screening, the rate of screening with positive results 
would decrease only slightly, from 32% to 27%. Only 14% of 
these, however, would be true positives, and, overall, less than 
4% of patients screened would be newly identified cases of 
depression (see appendix 2 on bmj.com). 
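
The arithmetic in the preceding paragraph can be sketched as follows. Appendix 2 is not reproduced here, so this is an assumed reconstruction rather than the authors' own calculation. It assumes that the 10% overestimate in sensitivity is an absolute reduction from 85% to 75% (which gives a relative diagnostic odds ratio of about 1.9), that half of the depressed patients are already recognised and are therefore not counted as new cases, and that all rates are expressed per 100 patients attending. The function and variable names are illustrative.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive test in cases divided by odds of a positive test in non-cases."""
    return (sensitivity / (1 - sensitivity)) * (specificity / (1 - specificity))


def screening_yield(n_depressed, n_not_depressed, sensitivity, specificity, denominator=100):
    """Screen-positive rate, positive predictive value, and true positive yield.

    n_depressed and n_not_depressed are counts per `denominator` patients.
    """
    true_positives = n_depressed * sensitivity
    false_positives = n_not_depressed * (1 - specificity)
    positives = true_positives + false_positives
    return {
        "positive_rate": positives / denominator,
        "ppv": true_positives / positives,
        "true_positive_rate": true_positives / denominator,
    }


# Scenario as reported: prevalence 10%, sensitivity 85%, specificity 74%.
reported = screening_yield(10, 90, sensitivity=0.85, specificity=0.74)

# Assumed adjusted scenario: sensitivity 75%, and only the 5 unrecognised
# depressed patients per 100 can be detected as new cases.
adjusted = screening_yield(5, 90, sensitivity=0.75, specificity=0.74)

relative_dor = diagnostic_odds_ratio(0.85, 0.74) / diagnostic_odds_ratio(0.75, 0.74)

for label, result in (("reported", reported), ("adjusted", adjusted)):
    print(label, {key: round(value, 3) for key, value in result.items()})
print("relative diagnostic odds ratio:", round(relative_dor, 2))
```

Under these assumptions the sketch gives a screen-positive rate of about 32% and a positive predictive value of about 27% in the first scenario, and about 27% and 14% in the second, with fewer than 4 newly identified cases per 100 patients.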

We know of only one study, which was not included in any of 
the systematic reviews or meta-analyses that we reviewed, that 
assessed the yield of screening for depression with and without 
excluding patients with psychiatric disorders already treated 
with psychotropic drugs. 63 In that study of 113 women with 
breast cancer, the true positive rate of screening for depression 
fell from 21% to 7% after exclusion of patients who were 
already receiving treatment for depression before screening. 

Our results should be considered in the context of studies that 
have assessed whether screening for depression benefits patients. 
There are at least 11 trials in primary care, 10 as well as trials in 
perinatal care, 64 65 and cancer care, 66 that have tested whether 
screening and referral for depression treatment improves 
depression outcomes, and all have had negative results. 
Reflecting this, the US Preventive Services Task Force 
recommends screening for depression only when it is supported 
by integrated staff assisted depression management 
programmes.'' To our knowledge, only one published research 
study has documented an attempt to screen and provide 
collaborative care, as recommended by the task force, in a 
clinical setting. 67 In that study, from the Netherlands, 1687 high 
risk patients were invited to enrol in a screening trial, 780 
participated, and 71 cases of major depression were detected. 
Of the 71 patients identified, 36 were already receiving treatment 
for depression and 18 additional patients refused treatment or 
did not attend their scheduled appointment. Thus, only 17 people 
of 1687 potentially screened started treatment for depression. 

Strengths and limitations of review 

One possible limitation of the current study is that we searched 
for systematic reviews and meta-analyses, rather than for 
original studies, and there are probably many original studies 
on the diagnostic accuracy of depression screening tools that 
were not included. Our purpose, however, was to assess whether 
original studies appropriately excluded patients who already 
had a diagnosis or were receiving treatment and to determine 
whether systematic reviews and meta-analyses reflected potential 
bias from the failure to do this, which required a review of 
reviews. It is unlikely that including additional studies that were 
not listed in recent systematic reviews or meta-analyses would 
have substantively altered the results. 

Another potential limitation is that the proportion of patients 
who already had a diagnosis of or were receiving treatment for 
depression who were inappropriately included in the diagnostic 
accuracy studies reviewed is unknown. Only four of the studies 
that excluded such patients reported the proportion excluded 
for this reason, and this varied widely depending on the setting 
and the time period of the study. It was less than 2% in studies 
that collected data from 10 years ago in Turkey 40 and more than 
15 years ago in Sweden, 41 but about 10% in a 2003 study of 
patients in general practice from New Zealand 42 and just over 
20% in a 2004 study of primary care patients treated in a US 
Veterans Affairs setting. 43 In addition, the small number and 
substantial heterogeneity of studies that excluded patients who 
already had a diagnosis or were receiving treatment did not 
allow for an assessment of the effect of inclusion and exclusion 
decisions on diagnostic accuracy estimates. On the other hand, 
numerous studies have found that the inclusion of established 
cases among examined cohorts consistently inflates assessments 
of the accuracy of a diagnostic test, 14 and it is likely that this 
would also be the case in studies of depression screening tools. 

Conclusions and policy implications 

The importance of our findings relates to the use of depression 
questionnaires for screening, a procedure conducted to identify 
previously unrecognised cases. 2 3 In clinical practice, depression 
questionnaires are sometimes used for purposes other than 
screening, including monitoring the severity of symptoms in 
patients who already have a diagnosis of depression and 
assessing patients for recurrence of symptoms while they are 
being treated. The introduction and discussion sections of the 
17 systematic reviews and meta-analyses we reviewed indicate 
that all were intended to assess the diagnostic accuracy and 
utility of depression questionnaires for the purpose of 
screening — that is, for identification of new cases. None 
discussed how findings might apply to other possible uses for 
the questionnaires (such as monitoring progress of treatment or 
detection of relapse). In addition, the recommendations that 
have been issued by expert panels regarding depression 
screening in various settings discuss the use of screening 
instruments as a means of identifying new cases. 

Screening for depression is somewhat different from many other 
types of screening in that a history or interview might not 
necessarily be part of the evaluation before a screening tool is 
administered. To illustrate, the US Preventive Services Task 
Force recommends screening for cervical cancer in women who 
have been sexually active and have a cervix. 68 On the other hand, 
such screening is not recommended for women older than 65 
or for women who have recently had a normal result on a smear 
test. This approach to screening is predicated on some "filtering" 
to determine the appropriate individuals or groups to be 
screened. On the other hand, the task force's recommendations 
regarding depression screening 9 focus on issues in healthcare 
systems, such as the availability of staff assisted depression 
care, rather than on any upstream evaluation of patients before 
screening. In clinical settings, screening tools for depression 
might be routinely administered to all patients in the waiting 
room of a hospital, physician's office, or clinic, as has been 
recommended by expert panels. 4 Regardless of whether these 
screening tools are used with or without upstream "filtering" in 
clinical practice, accurate determinations of test characteristics 
that reflect the ability to detect previously unrecognised cases 
can be obtained only if this upstream "filtering" is done in 
studies to exclude patients who already have a diagnosis of 
depression. Our findings show that this is rarely done, and, as 
a result, existing evidence on the accuracy and case yield of 
depression screening tools could substantially overestimate their 
utility in clinical practice. Well designed studies that exclude 
patients who already have a diagnosis of or are receiving 
treatment for depression are needed to generate realistic 
determinations of the accuracy of depression screening tools in 
clinical settings to inform decisions about the risks and benefits 
of screening. 
What is already known on this topic 

The results of studies on the accuracy of screening tools for depression are routinely used by expert panels to make 
decisions about the potential benefits of depression screening 

What this study adds 

Studies of the accuracy of screening tools for depression rarely exclude patients who already have a diagnosis or are 
receiving treatment, a potential bias that is not evaluated in systematic reviews and meta-analyses 

This can result in inflated estimates of accuracy and of the yield of new cases, on which clinical practice and preventive 
care guidelines are often based, a problem that takes on greater importance as the rate of diagnosed and treated 
depression in the population increases 
Tables 



Table 1 | Systematic reviews (SR) and meta-analyses (MA) of diagnostic accuracy of depression screening tools 

| Study | Journal impact factor* | Screening tool, patients/setting | Review type | Publications reviewed | Cohorts reviewed | Cohorts that excluded diagnosed or treated patients | Method of quality assessment | Quality assessment included spectrum bias† | Inclusion or exclusion of diagnosed or treated patients noted |
|---|---|---|---|---|---|---|---|---|---|
| Gaynes, 2005 24 | NA | Depression screening tools in perinatal care | SR | 23 | 20 | 1 (5.0%) | Based on criteria from Cochrane working group | Yes | No‡ |
| Morse, 2006 28 | NA | HADS in patients with cancer | SR | 10 | 10 | 1 (10.0%) | No | NA | No |
| Wancata, 2006 38 | 3.9 | GDS in elderly patients | MA | 42 | 37 | 0 (0%) | Ad hoc§ | No | No |
| Gilbody, 2007 33 | 2.9 | PHQ-9 in primary care and hospital settings | MA | 18 | 15 | 0 (0%) | Ad hoc§ | No | No |
| Mitchell, 2007 34 | 2.2 | Short (<5 items) screening tools in primary care patients | MA | 12 | 10 | 3 (30.0%) | Newcastle-Ottawa scale | Yes | No |
| Thombs, 2007 31 | 2.2 | Depression screening tools in acute myocardial infarction patients | SR | 2 | 2 | 0 (0%) | Based on AAN review guidelines | Yes | No |
| Wittkampf, 2007 39 | 2.1 | PHQ-9 in primary care and hospital settings | MA | 12 | 9 | 0 (0%) | QUADAS | Yes | No |
| Mitchell, 2008 35 | 4.8 | 1-2 questions in cancer and palliative care | MA | 10 | 10 | 0 (0%) | Based on Pai et al 48 | Yes | No |
| Thekkumpurath, 2008 29 | 2.7 | Depression screening tools in palliative care | SR | 8 | 8 | 1 (12.5%) | No | NA | No |
| Thombs, 2008 30 | 31.7 | Depression screening tools in cardiovascular care | SR | 11 | 11 | 0 (0%) | USPSTF | Yes | No |
| Allen, 2009 23 | 1.2 | GDS in older adults or veterans in outpatient settings | SR | 4 | 4 | 0 (0%) | Ad hoc§ | No | No |
| Gibson, 2009 25 | 3.7 | EPDS in perinatal care | SR | 37 | 35 | 2 (5.7%) | Based on York Centre for Reviews and Dissemination system | Yes | No |
| Hewitt, 2009 38 | 6.9 | Depression screening tools in perinatal care | MA | 63 | 56 | 4 (7.1%) | QUADAS | Yes | No |
| Kalpakjian, 2009 28 | 1.4 | Depression screening tools in spinal cord injury patients | SR | 4 | 4 | 0 (0%) | No | NA | No |
| Mirkhil, 2009 27 | 3.0 | Depression screening tools in patients with pain episode | SR | 4 | 4 | 0 (0%) | Diagnostic test studies evaluation tool | Yes | No |
| Williams, 2009 32 | 4.7 | Depression screening tools in children and adolescents | SR | 9 | 9 | 0 (0%) | USPSTF | Yes | No |
| Mitchell, 2010¶ 37 | 3.8 | GDS in older primary care patients | MA | 13 | 12 | 0 (0%) | No | NA | No |

AAN=American Academy of Neurology; EPDS=Edinburgh postnatal depression scale; GDS=geriatric depression scale; HADS=hospital anxiety and depression 
scale; NA=not applicable; PHQ-9=patient health questionnaire-9; QUADAS=quality assessment for diagnostic accuracy studies; USPSTF=US Preventive Services 
Task Force. 

*Impact factor from year systematic review or meta-analysis was published. 
†Includes quality items related to "representativeness" of samples. 
‡In methods authors wrote "We excluded studies that included patients with a known current depressive illness (for whom a screen would not provide new 
information)." Of 23 studies included in systematic review, however, 22 did not exclude patients who were already recognised as depressed or treated for depression. 
Authors of review did not comment on inclusion or exclusion of such patients in results or discussion. 

§Reported extraction of one to two items related to study quality (for example, blinding). 

¶Article was an epublication ahead of print at time of our search and was subsequently published in 2010. 
Table 2 | Cohorts of diagnostic accuracy studies that excluded patients who already had diagnosis of or were receiving treatment for depression 

| Study (review/s included in) | Country of original study | Population | Year(s) data collected | No (%) excluded | Exclusion criterion |
|---|---|---|---|---|---|
| Arroll, 2003 42 (Mitchell 34) | New Zealand | General practice patients | NR | 47/476 (10%) | Taking psychotropic drugs |
| Arroll, 2005 89 (Mitchell 34) | New Zealand | General practice patients | NR | NR | Receiving psychotropic drugs |
| Aydin, 2004 (Gibson, 25 Hewitt 38) | Turkey | Postpartum women | 2001 | 6/347 (2%) | Psychiatric treatment history |
| Beck, 2005 (Hewitt 36) | US | Postpartum women | NR | NR | Diagnosis of depression during current pregnancy |
| Corson, 2004 43 (Mitchell 34) | US | Veterans Affairs primary care patients | 2002-3 | 762/3466 (22%) | Mental health appointment in chart within past 6 months |
| Lloyd-Williams, 2000, 2001 71 72 (Morse, 23 Thekkumpurath 29) | UK | Cancer patients in palliative care | NR | NR | Currently prescribed antidepressant medication |
| Vittayanont, 2006 73 (Hewitt 38) | Thailand | Women 6-8 weeks postpartum | 2003-4 | NR | Current diagnosis of and receiving treatment for psychiatric disorder |
| Wickberg, 1996 41 (Gaynes, 24 Gibson, 25 Hewitt 38) | Sweden | Women 2-3 months postpartum | NR | 4/1655 (0.2%) | Already in contact with general practitioner or psychiatrist |

NR=not reported. 






BMJ 2011;343:d4825 doi: 10.1136/bmj.d4825 






Figure 

Unique titles and abstracts identified and screened for potential eligibility (n=1216) 

Titles and abstracts excluded (n=1160): 
Not a systematic review or meta-analysis (n=377) 
Not a systematic review or meta-analysis of diagnostic accuracy of depression screening tools (n=783) 

Potentially eligible articles selected for full text review (n=56) 

Articles excluded (n=39): 
Not a systematic review or meta-analysis (n=14) 
Not a systematic review or meta-analysis of diagnostic accuracy of depression screening tools (n=24) 
Subset of another included systematic review or meta-analysis (n=1) 

Systematic reviews or meta-analyses included in review (n=17) 

Selection of systematic reviews and meta-analyses of diagnostic accuracy of screening tools for depression 
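
The counts in the flow diagram above can be cross-checked with simple arithmetic. The sketch below is a reader's consistency check, not part of the review's methods. The numbers are transcribed from the figure, and the variable and function names are our own.

```python
# Counts transcribed from the flow diagram; labels are shortened paraphrases.
screened = 1216
excluded_at_abstract = {
    "not a systematic review or meta-analysis": 377,
    "not on diagnostic accuracy of depression screening tools": 783,
}
full_text_reviewed = 56
excluded_at_full_text = {
    "not a systematic review or meta-analysis": 14,
    "not on diagnostic accuracy of depression screening tools": 24,
    "subset of another included review": 1,
}
included = 17


def remaining(start, exclusions):
    """Number of records left after subtracting every exclusion reason."""
    return start - sum(exclusions.values())


# Each stage should account for every record that entered it.
abstract_stage_ok = remaining(screened, excluded_at_abstract) == full_text_reviewed
full_text_stage_ok = remaining(full_text_reviewed, excluded_at_full_text) == included

print("abstract stage consistent:", abstract_stage_ok)
print("full text stage consistent:", full_text_stage_ok)
```

Both checks hold for the counts as transcribed: 1216 minus 1160 leaves 56 articles for full text review, and 56 minus 39 leaves the 17 included reviews.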